This notebook includes preliminary attempts to visualize some basic information about the dataset.

#df <- read.csv('../../Data/indonesia_indicators_time.csv')

Data Messiness

This section details the messiness of our dataset. First, we took a quick look at a few ways that items have been disaggregated.

When we first separated unique measures from one another, we concatenated all of the columns in the dataset that relate to disaggregation. Based on a cursory look, these are some of the breakdowns (note that these categories may not be complete). When we could identify that everyone appeared to be included (e.g., ‘ALLREGIONS’ or ‘BOTHSEX’), we did not count these measures as ‘disaggregated.’
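A minimal sketch of the concatenation and the "everyone included" check, in base R. The toy data, the column names `Sex` and `Region`, the value `RURAL`, and the contents of `all_inclusive` are assumptions for illustration; only 'ALLREGIONS' and 'BOTHSEX' come from the notes above.

```r
# Hypothetical toy data: Sex and Region stand in for the dataset's
# actual disaggregation columns, which may be named differently.
measures <- data.frame(
  SeriesDescription = c("Series A", "Series A", "Series B", "Series B"),
  Sex    = c("BOTHSEX", "FEMALE", "BOTHSEX", "BOTHSEX"),
  Region = c("ALLREGIONS", "ALLREGIONS", "RURAL", "ALLREGIONS"),
  stringsAsFactors = FALSE
)

disagg_cols <- c("Sex", "Region")

# Values read as "everyone included"; illustrative and likely incomplete.
all_inclusive <- c("BOTHSEX", "ALLREGIONS")

# Concatenate the disaggregation columns into one key per row.
measures$disagg_key <- do.call(paste, c(measures[disagg_cols], sep = "_"))

# A row counts as disaggregated if any disaggregation column holds a
# value outside the all-inclusive set.
measures$disaggregated <- Reduce(
  `|`,
  lapply(measures[disagg_cols], function(col) !(col %in% all_inclusive))
)

print(measures[, c("SeriesDescription", "disagg_key", "disaggregated")])
```

Keeping the all-inclusive values in one vector makes it easy to extend the list as more "total" codes are identified.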

## `summarise()` has grouped output by 'Target', 'Indicator', 'SeriesDescription'. You can override using the `.groups` argument.
## `summarise()` has grouped output by 'disaggregation'. You can override using the `.groups` argument.
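The messages above come from dplyr's `summarise()` when the result is still grouped. A small sketch of how the `.groups` argument can make the choice explicit and silence the message; the toy data and the exact grouping are a guess at the shape of the original pipeline, not a copy of it.

```r
library(dplyr)

# Hypothetical toy data using the column names mentioned in the messages.
toy <- data.frame(
  Target            = c("1.1", "1.1", "1.2", "1.2"),
  Indicator         = c("1.1.1", "1.1.1", "1.2.1", "1.2.1"),
  SeriesDescription = c("Series A", "Series A", "Series B", "Series B"),
  disaggregation    = c("age_sex", "age_sex", "sector", "time"),
  stringsAsFactors  = FALSE
)

# .groups = "drop" returns an ungrouped result, so no message is printed.
counts <- toy %>%
  group_by(Target, Indicator, SeriesDescription, disaggregation) %>%
  summarise(n_rows = n(), .groups = "drop")

print(counts)
```

Using `.groups = "drop_last"` instead would keep the default behaviour (grouped by all but the last variable) without the message.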

## Warning: 'hctreemap' is deprecated.
## Use 'data_to_hierarchical' instead.
## See help("Deprecated")

The following takes a closer look at the disaggregation above: for each target, grouped by goal, we show whether measures are disaggregated and how many are.
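One way such a per-target count could be computed, sketched in base R. The toy data and the column names `Goal`, `Target`, and `disaggregated` are assumptions; the real data may be laid out differently.

```r
# Hypothetical toy data: one row per measure, flagged as disaggregated or not.
measures <- data.frame(
  Goal          = c(1, 1, 1, 2, 2),
  Target        = c("1.1", "1.1", "1.2", "2.1", "2.1"),
  disaggregated = c(TRUE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Helper columns so both totals can be summed in one call.
measures$n_measures      <- 1L
measures$n_disaggregated <- as.integer(measures$disaggregated)

# Total and disaggregated measures for each goal/target pair.
per_target <- aggregate(
  cbind(n_measures, n_disaggregated) ~ Goal + Target,
  data = measures,
  FUN  = sum
)

# Subset by goal, e.g. only the targets under goal 1.
goal_1 <- subset(per_target, Goal == 1)
print(goal_1)
```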

Finally, the following illustrates our current progress (with Indonesia) on how many indicators we have removed for each target and goal.

# Processed Indonesia data, including disaggregation
processedIndo <- read.csv('~/QMSS/G5055_Practicum_Project2/Data/processedIndo.csv')
nrow(processedIndo)
## [1] 4230
# Processed Indonesia data without disaggregation
processedIndo_No_Disagg <- read.csv('~/QMSS/G5055_Practicum_Project2/Data/processedIndo-WITHOUT disaggregation.csv')
nrow(processedIndo_No_Disagg)
## [1] 1809
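As a quick check on the two row counts reported above, the difference and the share of rows dropped can be computed directly. The variable names here are illustrative, and the counts are rows of the processed files, which may not map one-to-one onto indicators.

```r
# Row counts as reported by nrow() above.
n_with_disagg    <- 4230  # processedIndo
n_without_disagg <- 1809  # processedIndo_No_Disagg

# Rows dropped when disaggregation is excluded, and their share of the total.
rows_removed  <- n_with_disagg - n_without_disagg
share_removed <- rows_removed / n_with_disagg

print(rows_removed)
print(round(share_removed, 3))
```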

Guatemala

We also ran the same analysis for Guatemala.


[Treemap: Guatemala measures by disaggregation type. Counts recovered from the treemap data:]

disaggregation             count
age_sex                      372
geographic_region             99
other/not_disaggregated      534
raw_material                 127
sector                        42
time                          14

The following takes a closer look at the disaggregation above: for each target, grouped by goal, we show whether measures are disaggregated and how many are.

Measures with only one year of data
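A rough sketch of how measures observed in only a single year could be identified, in base R. The toy data and the column name `TimePeriod` are assumptions; `SeriesDescription` is used as the measure identifier only for illustration.

```r
# Hypothetical toy data: one row per measure-year observation.
obs <- data.frame(
  SeriesDescription = c("A", "A", "A", "B", "C", "C"),
  TimePeriod        = c(2000, 2005, 2010, 2015, 2012, 2012),
  stringsAsFactors  = FALSE
)

# Number of distinct years observed for each measure.
years_per_measure <- tapply(
  obs$TimePeriod,
  obs$SeriesDescription,
  function(y) length(unique(y))
)

# Measures that appear in exactly one year.
single_year_measures <- names(years_per_measure)[years_per_measure == 1]
print(single_year_measures)
```

Counting distinct years rather than rows matters here: measure "C" has two rows but both fall in the same year, so it is still flagged.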

## [1] "Missingness Across Time:"

## Weighted degree of each measure TBD